Dataset statistics
| Number of variables | 10 |
|---|---|
| Number of observations | 10000 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 781.4 KiB |
| Average record size in memory | 80.0 B |
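The average record size can be cross-checked against the total memory figure. The snippet below is a minimal check that uses only the values reported above; the closing remark about 8-byte values is an assumption about the underlying dtypes, not something the report states.

```python
# Values copied from the Dataset statistics table.
total_kib = 781.4
n_rows = 10_000
n_cols = 10

total_bytes = total_kib * 1024              # 1 KiB = 1024 bytes
avg_record_bytes = total_bytes / n_rows     # bytes per row
avg_value_bytes = avg_record_bytes / n_cols # bytes per cell

print(round(avg_record_bytes, 1), round(avg_value_bytes, 1))
```

Roughly 8 bytes per value would be consistent with every column being stored as a 64-bit number, though that is an inference rather than a reported fact.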
Variable types
| Numeric | 9 |
|---|---|
| Categorical | 1 |
Pregnancies is highly correlated with Diabetic | High correlation |
BMI is highly correlated with Diabetic | High correlation |
Age is highly correlated with Diabetic | High correlation |
Diabetic is highly correlated with Pregnancies and 2 other fields | High correlation |
BMI has unique values | Unique |
DiabetesPedigree has unique values | Unique |
Pregnancies has 2879 (28.8%) zeros | Zeros |
Reproduction
| Analysis started | 2022-01-21 18:58:31.140480 |
|---|---|
| Analysis finished | 2022-01-21 18:58:41.351529 |
| Duration | 10.21 seconds |
| Software version | pandas-profiling v3.1.1 |
| Download configuration | config.json |
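A report of this kind can be regenerated with pandas-profiling's `ProfileReport` class. The following is a minimal sketch, not the exact invocation used for this report: the function name `build_report`, the report title, and the file paths are hypothetical, and the original `config.json` settings are not reproduced.

```python
def build_report(csv_path, html_path="report.html"):
    """Profile a CSV file and write an HTML report; returns the output path."""
    # Imported lazily so the function can be defined even when the
    # packages are not installed in the current environment.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    profile = ProfileReport(df, title="Dataset profile")  # title is illustrative
    profile.to_file(html_path)
    return html_path
```

Calling `build_report("diabetes.csv")` (a placeholder file name) would write `report.html` next to the script, assuming pandas and pandas-profiling are installed.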
PatientID
Real number (ℝ≥0)
| Distinct | 9959 |
|---|---|
| Distinct (%) | 99.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1502122.083 |
| Minimum | 1000038 |
|---|---|
| Maximum | 1999997 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 78.2 KiB |
Quantile statistics
| Minimum | 1000038 |
|---|---|
| 5-th percentile | 1052241.75 |
| Q1 | 1251672.25 |
| median | 1504394 |
| Q3 | 1754607.5 |
| 95-th percentile | 1951606.9 |
| Maximum | 1999997 |
| Range | 999959 |
| Interquartile range (IQR) | 502935.25 |
Descriptive statistics
| Standard deviation | 289286.7648 |
|---|---|
| Coefficient of variation (CV) | 0.1925853885 |
| Kurtosis | -1.199849714 |
| Mean | 1502122.083 |
| Median Absolute Deviation (MAD) | 251150.5 |
| Skewness | -0.00477009371 |
| Sum | 1.502122083 × 10¹⁰ |
| Variance | 8.368683229 × 10¹⁰ |
| Monotonicity | Not monotonic |
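Several of the figures above are simple functions of the others. As a sanity check, the sketch below recomputes the range, IQR, coefficient of variation, and variance for PatientID from the reported minimum, maximum, quartiles, mean, and standard deviation. It uses only numbers copied from the tables; small differences in the last digits are expected because the inputs are already rounded.

```python
# Values copied from the PatientID tables.
minimum, maximum = 1000038, 1999997
q1, q3 = 1251672.25, 1754607.5
mean, std = 1502122.083, 289286.7648

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range (IQR)
cv = std / mean                  # Coefficient of variation (CV)
variance = std ** 2              # Variance

print(value_range, iqr, round(cv, 6), f"{variance:.4e}")
```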
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1623043 | 2 | < 0.1% |
| 1819876 | 2 | < 0.1% |
| 1224602 | 2 | < 0.1% |
| 1772038 | 2 | < 0.1% |
| 1830191 | 2 | < 0.1% |
| 1455760 | 2 | < 0.1% |
| 1109455 | 2 | < 0.1% |
| 1541930 | 2 | < 0.1% |
| 1407053 | 2 | < 0.1% |
| 1184651 | 2 | < 0.1% |
| Other values (9949) | 9980 |
| Value | Count | Frequency (%) |
| 1000038 | 1 | |
| 1000183 | 1 | |
| 1000326 | 1 | |
| 1000340 | 1 | |
| 1000471 | 1 | |
| 1000510 | 1 | |
| 1000652 | 1 | |
| 1000869 | 1 | |
| 1000963 | 1 | |
| 1001229 | 1 |
| Value | Count | Frequency (%) |
| 1999997 | 1 | |
| 1999864 | 1 | |
| 1999836 | 1 | |
| 1999319 | 1 | |
| 1999250 | 1 | |
| 1999214 | 1 | |
| 1999201 | 1 | |
| 1999183 | 1 | |
| 1998989 | 1 | |
| 1998962 | 1 |
Pregnancies
Real number (ℝ≥0)
| Distinct | 15 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.2558 |
| Minimum | 0 |
|---|---|
| Maximum | 14 |
| Zeros | 2879 |
| Zeros (%) | 28.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 78.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 2 |
| Q3 | 6 |
| 95-th percentile | 9 |
| Maximum | 14 |
| Range | 14 |
| Interquartile range (IQR) | 6 |
Descriptive statistics
| Standard deviation | 3.405719638 |
|---|---|
| Coefficient of variation (CV) | 1.046046943 |
| Kurtosis | -0.5290873777 |
| Mean | 3.2558 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 0.8107787892 |
| Sum | 32558 |
| Variance | 11.59892625 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=15)
| Value | Count | Frequency (%) |
| 0 | 2879 | |
| 1 | 1932 | |
| 3 | 784 | 7.8% |
| 6 | 720 | 7.2% |
| 2 | 623 | 6.2% |
| 7 | 606 | 6.1% |
| 9 | 602 | 6.0% |
| 5 | 456 | 4.6% |
| 4 | 454 | 4.5% |
| 8 | 446 | 4.5% |
| Other values (5) | 498 | 5.0% |
| Value | Count | Frequency (%) |
| 0 | 2879 | |
| 1 | 1932 | |
| 2 | 623 | 6.2% |
| 3 | 784 | 7.8% |
| 4 | 454 | 4.5% |
| 5 | 456 | 4.6% |
| 6 | 720 | 7.2% |
| 7 | 606 | 6.1% |
| 8 | 446 | 4.5% |
| 9 | 602 | 6.0% |
| Value | Count | Frequency (%) |
| 14 | 21 | 0.2% |
| 13 | 49 | 0.5% |
| 12 | 43 | 0.4% |
| 11 | 87 | 0.9% |
| 10 | 298 | |
| 9 | 602 | |
| 8 | 446 | |
| 7 | 606 | |
| 6 | 720 | |
| 5 | 456 |
PlasmaGlucose
Real number (ℝ≥0)
| Distinct | 149 |
|---|---|
| Distinct (%) | 1.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 107.8502 |
| Minimum | 44 |
|---|---|
| Maximum | 192 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 78.2 KiB |
Quantile statistics
| Minimum | 44 |
|---|---|
| 5-th percentile | 57 |
| Q1 | 84 |
| median | 105 |
| Q3 | 129 |
| 95-th percentile | 168 |
| Maximum | 192 |
| Range | 148 |
| Interquartile range (IQR) | 45 |
Descriptive statistics
| Standard deviation | 31.92090936 |
|---|---|
| Coefficient of variation (CV) | 0.2959745032 |
| Kurtosis | -0.5353944505 |
| Mean | 107.8502 |
| Median Absolute Deviation (MAD) | 22 |
| Skewness | 0.3260878604 |
| Sum | 1078502 |
| Variance | 1018.944454 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 97 | 171 | 1.7% |
| 96 | 164 | 1.6% |
| 118 | 146 | 1.5% |
| 107 | 141 | 1.4% |
| 119 | 134 | 1.3% |
| 101 | 131 | 1.3% |
| 85 | 131 | 1.3% |
| 95 | 131 | 1.3% |
| 89 | 130 | 1.3% |
| 116 | 127 | 1.3% |
| Other values (139) | 8594 |
| Value | Count | Frequency (%) |
| 44 | 14 | 0.1% |
| 45 | 29 | |
| 46 | 17 | 0.2% |
| 47 | 23 | 0.2% |
| 48 | 32 | |
| 49 | 14 | 0.1% |
| 50 | 25 | |
| 51 | 37 | |
| 52 | 60 | |
| 53 | 58 |
| Value | Count | Frequency (%) |
| 192 | 4 | < 0.1% |
| 191 | 5 | |
| 190 | 2 | < 0.1% |
| 189 | 4 | < 0.1% |
| 188 | 9 | |
| 187 | 8 | |
| 186 | 5 | |
| 185 | 8 | |
| 184 | 8 | |
| 183 | 11 |
DiastolicBloodPressure
Real number (ℝ≥0)
| Distinct | 90 |
|---|---|
| Distinct (%) | 0.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 71.2075 |
| Minimum | 24 |
|---|---|
| Maximum | 117 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 78.2 KiB |
Quantile statistics
| Minimum | 24 |
|---|---|
| 5-th percentile | 45 |
| Q1 | 58 |
| median | 72 |
| Q3 | 85 |
| 95-th percentile | 96 |
| Maximum | 117 |
| Range | 93 |
| Interquartile range (IQR) | 27 |
Descriptive statistics
| Standard deviation | 16.80147829 |
|---|---|
| Coefficient of variation (CV) | 0.2359509643 |
| Kurtosis | -0.8243769924 |
| Mean | 71.2075 |
| Median Absolute Deviation (MAD) | 13 |
| Skewness | -0.1056225452 |
| Sum | 712075 |
| Variance | 282.2896727 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 81 | 290 | 2.9% |
| 79 | 273 | 2.7% |
| 78 | 271 | 2.7% |
| 83 | 271 | 2.7% |
| 86 | 264 | 2.6% |
| 84 | 259 | 2.6% |
| 87 | 252 | 2.5% |
| 60 | 247 | 2.5% |
| 85 | 247 | 2.5% |
| 80 | 246 | 2.5% |
| Other values (80) | 7380 |
| Value | Count | Frequency (%) |
| 24 | 16 | |
| 25 | 10 | |
| 26 | 8 | |
| 27 | 10 | |
| 28 | 8 | |
| 29 | 3 | < 0.1% |
| 30 | 9 | |
| 31 | 5 | 0.1% |
| 32 | 5 | 0.1% |
| 33 | 3 | < 0.1% |
| Value | Count | Frequency (%) |
| 117 | 2 | < 0.1% |
| 116 | 7 | |
| 115 | 7 | |
| 114 | 8 | |
| 113 | 9 | |
| 112 | 1 | < 0.1% |
| 111 | 9 | |
| 110 | 2 | < 0.1% |
| 109 | 10 | |
| 108 | 12 |
TricepsThickness
Real number (ℝ≥0)
| Distinct | 66 |
|---|---|
| Distinct (%) | 0.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 28.8176 |
| Minimum | 7 |
|---|---|
| Maximum | 92 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 78.2 KiB |
Quantile statistics
| Minimum | 7 |
|---|---|
| 5-th percentile | 8 |
| Q1 | 15 |
| median | 31 |
| Q3 | 41 |
| 95-th percentile | 52 |
| Maximum | 92 |
| Range | 85 |
| Interquartile range (IQR) | 26 |
Descriptive statistics
| Standard deviation | 14.50648042 |
|---|---|
| Coefficient of variation (CV) | 0.5033896097 |
| Kurtosis | -0.7037530765 |
| Mean | 28.8176 |
| Median Absolute Deviation (MAD) | 12 |
| Skewness | 0.1636121117 |
| Sum | 288176 |
| Variance | 210.437974 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 11 | 433 | 4.3% |
| 10 | 385 | 3.9% |
| 9 | 383 | 3.8% |
| 34 | 359 | 3.6% |
| 7 | 352 | 3.5% |
| 8 | 336 | 3.4% |
| 45 | 336 | 3.4% |
| 35 | 325 | 3.2% |
| 42 | 323 | 3.2% |
| 44 | 321 | 3.2% |
| Other values (56) | 6447 |
| Value | Count | Frequency (%) |
| 7 | 352 | |
| 8 | 336 | |
| 9 | 383 | |
| 10 | 385 | |
| 11 | 433 | |
| 12 | 254 | |
| 13 | 136 | 1.4% |
| 14 | 150 | 1.5% |
| 15 | 243 | |
| 16 | 127 | 1.3% |
| Value | Count | Frequency (%) |
| 92 | 2 | |
| 91 | 2 | |
| 90 | 2 | |
| 89 | 3 | |
| 88 | 3 | |
| 86 | 4 | |
| 75 | 2 | |
| 74 | 4 | |
| 73 | 3 | |
| 72 | 4 |
SerumInsulin
Real number (ℝ≥0)
| Distinct | 620 |
|---|---|
| Distinct (%) | 6.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 139.2436 |
| Minimum | 14 |
|---|---|
| Maximum | 796 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 78.2 KiB |
Quantile statistics
| Minimum | 14 |
|---|---|
| 5-th percentile | 18 |
| Q1 | 39 |
| median | 85 |
| Q3 | 197 |
| 95-th percentile | 409 |
| Maximum | 796 |
| Range | 782 |
| Interquartile range (IQR) | 158 |
Descriptive statistics
| Standard deviation | 133.7779194 |
|---|---|
| Coefficient of variation (CV) | 0.9607473476 |
| Kurtosis | 3.567230944 |
| Mean | 139.2436 |
| Median Absolute Deviation (MAD) | 62 |
| Skewness | 1.741118017 |
| Sum | 1392436 |
| Variance | 17896.53171 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 28 | 115 | 1.1% |
| 23 | 114 | 1.1% |
| 16 | 114 | 1.1% |
| 47 | 111 | 1.1% |
| 27 | 111 | 1.1% |
| 32 | 108 | 1.1% |
| 44 | 107 | 1.1% |
| 43 | 105 | 1.1% |
| 14 | 105 | 1.1% |
| 46 | 104 | 1.0% |
| Other values (610) | 8906 |
| Value | Count | Frequency (%) |
| 14 | 105 | |
| 15 | 93 | |
| 16 | 114 | |
| 17 | 99 | |
| 18 | 90 | |
| 19 | 101 | |
| 20 | 86 | |
| 21 | 101 | |
| 22 | 91 | |
| 23 | 114 |
| Value | Count | Frequency (%) |
| 796 | 1 | |
| 795 | 1 | |
| 793 | 1 | |
| 787 | 1 | |
| 786 | 1 | |
| 783 | 1 | |
| 773 | 1 | |
| 767 | 1 | |
| 762 | 1 | |
| 758 | 1 |
BMI
Real number (ℝ≥0)
| Distinct | 10000 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 31.56702174 |
| Minimum | 18.20080735 |
|---|---|
| Maximum | 56.03462763 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 78.2 KiB |
Quantile statistics
| Minimum | 18.20080735 |
|---|---|
| 5-th percentile | 18.93106087 |
| Q1 | 21.24742683 |
| median | 31.92242078 |
| Q3 | 39.32892145 |
| 95-th percentile | 47.10058666 |
| Maximum | 56.03462763 |
| Range | 37.83382028 |
| Interquartile range (IQR) | 18.08149461 |
Descriptive statistics
| Standard deviation | 9.804365694 |
|---|---|
| Coefficient of variation (CV) | 0.3105888726 |
| Kurtosis | -1.208882423 |
| Mean | 31.56702174 |
| Median Absolute Deviation (MAD) | 9.944330905 |
| Skewness | 0.188418633 |
| Sum | 315670.2174 |
| Variance | 96.12558665 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 43.50972593 | 1 | < 0.1% |
| 36.00097863 | 1 | < 0.1% |
| 25.54622503 | 1 | < 0.1% |
| 33.92169648 | 1 | < 0.1% |
| 48.1591365 | 1 | < 0.1% |
| 35.71809509 | 1 | < 0.1% |
| 21.81520611 | 1 | < 0.1% |
| 20.62707927 | 1 | < 0.1% |
| 36.12071674 | 1 | < 0.1% |
| 33.81964335 | 1 | < 0.1% |
| Other values (9990) | 9990 |
| Value | Count | Frequency (%) |
| 18.20080735 | 1 | |
| 18.20119302 | 1 | |
| 18.20322924 | 1 | |
| 18.20753772 | 1 | |
| 18.20976867 | 1 | |
| 18.21031909 | 1 | |
| 18.21032302 | 1 | |
| 18.21145072 | 1 | |
| 18.213945 | 1 | |
| 18.21428625 | 1 |
| Value | Count | Frequency (%) |
| 56.03462763 | 1 | |
| 55.94718308 | 1 | |
| 55.8662382 | 1 | |
| 55.85881276 | 1 | |
| 55.7064282 | 1 | |
| 55.62039267 | 1 | |
| 55.61299774 | 1 | |
| 55.58741669 | 1 | |
| 55.57960103 | 1 | |
| 55.53899604 | 1 |
DiabetesPedigree
Real number (ℝ≥0)
| Distinct | 10000 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.4009437247 |
| Minimum | 0.078043795 |
|---|---|
| Maximum | 2.301594189 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 78.2 KiB |
Quantile statistics
| Minimum | 0.078043795 |
|---|---|
| 5-th percentile | 0.09052474235 |
| Q1 | 0.1370654103 |
| median | 0.199698294 |
| Q3 | 0.6211583733 |
| 95-th percentile | 1.146758208 |
| Maximum | 2.301594189 |
| Range | 2.223550394 |
| Interquartile range (IQR) | 0.484092963 |
Descriptive statistics
| Standard deviation | 0.38146316 |
|---|---|
| Coefficient of variation (CV) | 0.9514132195 |
| Kurtosis | 2.924154025 |
| Mean | 0.4009437247 |
| Median Absolute Deviation (MAD) | 0.091999566 |
| Skewness | 1.676804816 |
| Sum | 4009.437247 |
| Variance | 0.1455141424 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1.213191354 | 1 | < 0.1% |
| 0.944162422 | 1 | < 0.1% |
| 1.064307199 | 1 | < 0.1% |
| 0.169939639 | 1 | < 0.1% |
| 0.472797747 | 1 | < 0.1% |
| 0.736663397 | 1 | < 0.1% |
| 0.268128612 | 1 | < 0.1% |
| 0.091008464 | 1 | < 0.1% |
| 0.708723939 | 1 | < 0.1% |
| 0.187784517 | 1 | < 0.1% |
| Other values (9990) | 9990 |
| Value | Count | Frequency (%) |
| 0.078043795 | 1 | |
| 0.078082666 | 1 | |
| 0.078092648 | 1 | |
| 0.078107082 | 1 | |
| 0.078169565 | 1 | |
| 0.078176817 | 1 | |
| 0.078181344 | 1 | |
| 0.078236157 | 1 | |
| 0.078242371 | 1 | |
| 0.078250558 | 1 |
| Value | Count | Frequency (%) |
| 2.301594189 | 1 | |
| 2.291294242 | 1 | |
| 2.287388168 | 1 | |
| 2.285180184 | 1 | |
| 2.270415383 | 1 | |
| 2.267550416 | 1 | |
| 2.246609162 | 1 | |
| 2.245287697 | 1 | |
| 2.215815235 | 1 | |
| 2.204918601 | 1 |
Age
Real number (ℝ≥0)
| Distinct | 56 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 30.1341 |
| Minimum | 21 |
|---|---|
| Maximum | 77 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 78.2 KiB |
Quantile statistics
| Minimum | 21 |
|---|---|
| 5-th percentile | 21 |
| Q1 | 22 |
| median | 24 |
| Q3 | 35 |
| 95-th percentile | 57 |
| Maximum | 77 |
| Range | 56 |
| Interquartile range (IQR) | 13 |
Descriptive statistics
| Standard deviation | 12.10604695 |
|---|---|
| Coefficient of variation (CV) | 0.4017391245 |
| Kurtosis | 1.222442785 |
| Mean | 30.1341 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 1.484823135 |
| Sum | 301341 |
| Variance | 146.5563728 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 22 | 1692 | |
| 21 | 1670 | |
| 23 | 1332 | |
| 25 | 682 | 6.8% |
| 24 | 652 | 6.5% |
| 26 | 647 | 6.5% |
| 45 | 203 | 2.0% |
| 46 | 191 | 1.9% |
| 44 | 187 | 1.9% |
| 43 | 185 | 1.8% |
| Other values (46) | 2559 |
| Value | Count | Frequency (%) |
| 21 | 1670 | |
| 22 | 1692 | |
| 23 | 1332 | |
| 24 | 652 | 6.5% |
| 25 | 682 | |
| 26 | 647 | 6.5% |
| 28 | 23 | 0.2% |
| 29 | 37 | 0.4% |
| 30 | 136 | 1.4% |
| 31 | 125 | 1.2% |
| Value | Count | Frequency (%) |
| 77 | 3 | < 0.1% |
| 76 | 2 | < 0.1% |
| 75 | 5 | 0.1% |
| 74 | 3 | < 0.1% |
| 73 | 4 | < 0.1% |
| 72 | 3 | < 0.1% |
| 71 | 32 | |
| 70 | 23 | |
| 69 | 23 | |
| 68 | 14 |
Diabetic
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 78.2 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 10000 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 1 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 6656 | |
| 1 | 3344 |
Length
Histogram of lengths of the category
Pie chart
| Value | Count | Frequency (%) |
| 0 | 6656 | |
| 1 | 3344 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 6656 | |
| 1 | 3344 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 10000 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 6656 | |
| 1 | 3344 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 10000 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 6656 | |
| 1 | 3344 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 10000 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 6656 | |
| 1 | 3344 |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
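The rank-based definition can be sketched in a few lines of plain Python. This is an illustration rather than the code the report uses: the helper names `average_ranks`, `pearson_r`, and `spearman_rho` are hypothetical, the data are a toy example, and tied values are given their average rank (a common convention, assumed here).

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # the 1/n factors cancel

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Toy data: y = x**2 is monotonic in x but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]
print(spearman_rho(x, y), pearson_r(x, y))
```

On this toy data ρ reaches 1 because the ordering is preserved exactly, while r stays below 1 because the relationship is not a straight line.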
Pearson's r
Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line (its angle to the x-axis) does not affect the magnitude of r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
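The invariance property is easy to check numerically. The sketch below uses the Age and BMI values from the first ten rows of this dataset purely as sample input (ten rows are not representative of the full data); the function name `pearson_r` and the rescaling constants are illustrative choices.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # the 1/n factors cancel

# Age and BMI copied from the "First rows" table.
age = [21, 23, 23, 43, 22, 26, 21, 26, 53, 26]
bmi = [43.509726, 21.240576, 41.511523, 29.582192, 42.604536,
       19.724160, 21.941357, 18.277723, 26.624929, 36.889576]

r = pearson_r(age, bmi)

# Change location and scale of each variable separately (positive scale factors).
age_months = [12 * a for a in age]
bmi_shifted = [0.5 * b - 3.0 for b in bmi]
r_rescaled = pearson_r(age_months, bmi_shifted)

print(r, r_rescaled)  # the two values agree up to floating-point error
```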
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
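The pair-counting definition translates directly into code. The following sketch implements it exactly as stated (often called tau-a); the function name `kendall_tau_a` and the toy data are hypothetical. Tied pairs count as neither concordant nor discordant here, whereas library implementations commonly apply a tie correction (tau-b), so results can differ on data with many ties.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        # The sign of the product tells whether x and y move in the same direction.
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs

# Toy data: 7 concordant and 3 discordant pairs out of 10.
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 2, 5]
print(kendall_tau_a(x, y))  # (7 - 3) / 10 = 0.4
```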
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
First rows
| | PatientID | Pregnancies | PlasmaGlucose | DiastolicBloodPressure | TricepsThickness | SerumInsulin | BMI | DiabetesPedigree | Age | Diabetic |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1354778 | 0 | 171 | 80 | 34 | 23 | 43.509726 | 1.213191 | 21 | 0 |
| 1 | 1147438 | 8 | 92 | 93 | 47 | 36 | 21.240576 | 0.158365 | 23 | 0 |
| 2 | 1640031 | 7 | 115 | 47 | 52 | 35 | 41.511523 | 0.079019 | 23 | 0 |
| 3 | 1883350 | 9 | 103 | 78 | 25 | 304 | 29.582192 | 1.282870 | 43 | 1 |
| 4 | 1424119 | 1 | 85 | 59 | 27 | 35 | 42.604536 | 0.549542 | 22 | 0 |
| 5 | 1619297 | 0 | 82 | 92 | 9 | 253 | 19.724160 | 0.103424 | 26 | 0 |
| 6 | 1660149 | 0 | 133 | 47 | 19 | 227 | 21.941357 | 0.174160 | 21 | 0 |
| 7 | 1458769 | 0 | 67 | 87 | 43 | 36 | 18.277723 | 0.236165 | 26 | 0 |
| 8 | 1201647 | 8 | 80 | 95 | 33 | 24 | 26.624929 | 0.443947 | 53 | 1 |
| 9 | 1403912 | 1 | 72 | 31 | 40 | 42 | 36.889576 | 0.103944 | 26 | 0 |
Last rows
| | PatientID | Pregnancies | PlasmaGlucose | DiastolicBloodPressure | TricepsThickness | SerumInsulin | BMI | DiabetesPedigree | Age | Diabetic |
|---|---|---|---|---|---|---|---|---|---|---|
| 9990 | 1317550 | 0 | 86 | 80 | 10 | 36 | 43.389898 | 0.083597 | 21 | 0 |
| 9991 | 1819056 | 4 | 140 | 94 | 25 | 170 | 32.448878 | 0.108273 | 45 | 1 |
| 9992 | 1639966 | 4 | 100 | 83 | 34 | 49 | 26.273109 | 0.136661 | 42 | 1 |
| 9993 | 1006612 | 1 | 69 | 85 | 17 | 46 | 36.564569 | 0.139280 | 23 | 0 |
| 9994 | 1464564 | 0 | 84 | 39 | 35 | 37 | 41.443376 | 0.123610 | 26 | 0 |
| 9995 | 1469198 | 6 | 95 | 85 | 37 | 267 | 18.497542 | 0.660240 | 31 | 0 |
| 9996 | 1432736 | 0 | 55 | 51 | 7 | 50 | 21.865341 | 0.086589 | 34 | 0 |
| 9997 | 1410962 | 5 | 99 | 59 | 47 | 67 | 30.774018 | 2.301594 | 43 | 1 |
| 9998 | 1958653 | 0 | 145 | 67 | 30 | 21 | 18.811861 | 0.789572 | 26 | 0 |
| 9999 | 1332938 | 10 | 100 | 54 | 34 | 27 | 38.840943 | 0.175465 | 23 | 0 |